31 research outputs found

    Motif Discovery through Predictive Modeling of Gene Regulation

    We present MEDUSA, an integrative method for learning motif models of transcription factor binding sites by incorporating promoter sequence and gene expression data. We use a modern large-margin machine learning approach, based on boosting, to enable feature selection from the high-dimensional search space of candidate binding sequences while avoiding overfitting. At each iteration of the algorithm, MEDUSA builds a motif model whose presence in the promoter region of a gene, coupled with activity of a regulator in an experiment, is predictive of differential expression. In this way, we learn motifs that are functional and predictive of regulatory response rather than motifs that are simply overrepresented in promoter sequences. Moreover, MEDUSA produces a model of the transcriptional control logic that can predict the expression of any gene in the organism, given the sequence of the promoter region of the target gene and the expression state of a set of known or putative transcription factors and signaling molecules. Each motif model is either a k-length sequence, a dimer, or a PSSM that is built by agglomerative probabilistic clustering of sequences with similar boosting loss. By applying MEDUSA to a set of environmental stress response expression data in yeast, we learn motifs whose ability to predict differential expression of target genes outperforms motifs from the TRANSFAC dataset and from a previously published candidate set of PSSMs. We also show that MEDUSA retrieves many experimentally confirmed binding sites associated with environmental stress response from the literature. Comment: RECOMB 200

    Treatment-responsive pudendal dysfunction in chronic inflammatory demyelinating polyneuropathy.

    Contains fulltext: 52470.pdf (publisher's version) (Open Access)

    Genomewide analysis of Drosophila GAGA factor target genes reveals context-dependent DNA binding.

    The association of sequence-specific DNA-binding factors with their cognate target sequences in vivo depends on the local molecular context, yet this context is poorly understood. To address this issue, we have performed genomewide mapping of in vivo target genes of Drosophila GAGA factor (GAF). The resulting list of ≈250 target genes indicates that GAF regulates many cellular pathways. We applied unbiased motif-based regression analysis to identify the sequence context that determines GAF binding. Our results confirm that GAF selectively associates with (GA)(n) repeat elements in vivo. GAF binding occurs in upstream regulatory regions, but less in downstream regions. Surprisingly, GAF binds abundantly to introns but is virtually absent from exons, even though the density of (GA)(n) is roughly the same. Intron binding occurs as frequently in last introns as in first introns, suggesting that GAF may not only regulate transcription initiation, but possibly also elongation. We provide evidence for cooperative binding of GAF to closely spaced (GA)(n) elements and explain the lack of GAF binding to exons by the absence of such closely spaced GA repeats. Our approach for revealing determinants of context-dependent DNA binding will be applicable to many other transcription factors.